Comparison between multistage filters and sketches for finding heavy hitters
Abstract
This write-up compares multistage filters [3] and sketches with respect to their ability to identify heavy hitters. In a nutshell, the conclusion is that multistage filters, as I use them, identify heavy hitters with less memory than sketches, but some sketches support other important operations: specifically, they can be added and subtracted without any need to re-read the data stream(s). Both multistage filters and sketches work in the streaming model: data items with various identifiers make up a stream of traffic or updates that the data structures operate on. Both hash these updates to counters based on the identifiers. The main difference is that sketches treat these counters as a summary of the traffic that can be used for many operations, while filters use the counters only as a means of identifying the heavy hitters. After discussing the differences in more detail, I compare the memory usage of these solutions for the heavy hitter problem. Since there are many types of sketches and many uses for them, I focus on three papers that use sketches for detecting heavy hitters: [1] is a widely quoted paper that addresses exactly the problem of finding heavy hitters; [2] is a more recent paper that proposes improved sketches for finding heavy hitters; and [4] applies sketches to detect big changes in network traffic, which is related to but different from finding heavy hitters. It is worth pointing out that many of the sketch papers (including all three discussed here) are concurrent with or were published after my paper on multistage filters [3].
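The distinction drawn in the abstract can be illustrated with a minimal sketch. It is not the implementation from [3] or from any of the sketch papers: the class names, the BLAKE2-based hash, the parameters, and the count-min-style layout of the counters are all assumptions made for illustration. Both structures hash identifiers to the same kind of counter array; the filter uses the counters only to decide which identifiers to report, while the sketch-style use treats the counter array itself as a summary that can be subtracted from another one.

```python
import hashlib


def _bucket(identifier: str, stage: int, width: int) -> int:
    """Map an identifier to a counter index; each stage uses its own salt."""
    salt = stage.to_bytes(8, "little")
    digest = hashlib.blake2b(identifier.encode(), salt=salt).digest()
    return int.from_bytes(digest[:8], "little") % width


class CounterArray:
    """Hypothetical shared core: `depth` stages of `width` counters each."""

    def __init__(self, depth: int, width: int):
        self.depth, self.width = depth, width
        self.rows = [[0] * width for _ in range(depth)]

    def update(self, identifier: str, amount: int = 1) -> None:
        # One counter per stage is incremented for every update.
        for stage in range(self.depth):
            self.rows[stage][_bucket(identifier, stage, self.width)] += amount

    def min_counter(self, identifier: str) -> int:
        # Smallest counter the identifier hashes to, across all stages.
        return min(
            self.rows[stage][_bucket(identifier, stage, self.width)]
            for stage in range(self.depth)
        )

    def subtract(self, other: "CounterArray") -> "CounterArray":
        # Sketch-style use: counter-wise difference of two summaries,
        # computed without re-reading either stream.
        assert (self.depth, self.width) == (other.depth, other.width)
        result = CounterArray(self.depth, self.width)
        result.rows = [
            [a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.rows, other.rows)
        ]
        return result


class MultistageFilter:
    """Filter-style use: counters only gate entry into the heavy-hitter set."""

    def __init__(self, depth: int, width: int, threshold: int):
        self.counters = CounterArray(depth, width)
        self.threshold = threshold
        self.heavy = set()

    def update(self, identifier: str, amount: int = 1) -> None:
        self.counters.update(identifier, amount)
        # Report the identifier once all of its counters reach the threshold.
        if self.counters.min_counter(identifier) >= self.threshold:
            self.heavy.add(identifier)


# Usage: one large item among small ones, then a difference of two summaries.
flt = MultistageFilter(depth=4, width=64, threshold=100)
for ident, amount in [("big", 60), ("small", 1), ("big", 60)]:
    flt.update(ident, amount)

before, after = CounterArray(4, 64), CounterArray(4, 64)
before.update("x", 5)
after.update("x", 5)
after.update("y", 7)
change = after.subtract(before)  # counters now reflect only the "y" update
```

Because every counter an identifier hashes to is at least as large as that identifier's own total, an item whose total reaches the threshold is always reported; collisions can only cause extra items to be reported. The subtraction works because updates are purely additive, which is the property the abstract refers to when it says some sketches can be added and subtracted.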
Similar resources
A Reversible Sketch Based on Chinese Remainder Theorem: Scheme and Performance Study
In recent times, sketch-based techniques have been emerging as useful data stream computation techniques for processing massive data. In many applications, finding heavy hitters and heavy changers is essential, and this task demands the reversibility property of sketches. Continuing the trend of arriving at newer reversible sketches, this paper presents a scheme based on the Chinese Remainder Theorem. The s...
New Algorithms for Heavy Hitters in Data Streams
An old and fundamental problem in databases and data streams is that of finding the heavy hitters, also known as the top-k, most popular items, frequent items, elephants, or iceberg queries. There are several variants of this problem, which quantify what it means for an item to be frequent, including what are known as the ℓ1-heavy hitters and ℓ2-heavy hitters. There are a number of algorithmic ...
New Algorithms for Heavy Hitters in Data Streams (Invited Talk)
An old and fundamental problem in databases and data streams is that of finding the heavy hitters, also known as the top-k, most popular items, frequent items, elephants, or iceberg queries. There are several variants of this problem, which quantify what it means for an item to be frequent, including what are known as the l1-heavy hitters and l2-heavy hitters. There are a number of algorithmic ...
Block Heavy Hitters
We study a natural generalization of the heavy hitters problem in the streaming context. We term this generalization block heavy hitters and define it as follows. We are to stream over a matrix A, and report all rows that are heavy, where a row is heavy if its ℓ1-norm is at least a φ fraction of the ℓ1-norm of the entire matrix A. In comparison, in the standard heavy hitters problem, we are requi...
Parallel mining of time-faded heavy hitters
We present PFDCMSS, a novel message-passing based parallel algorithm for mining time-faded heavy hitters. The algorithm is a parallel version of the recently published FDCMSS sequential algorithm. We formally prove its correctness by showing that the underlying data structure, a sketch augmented with a Space Saving stream summary holding exactly two counters, is mergeable. Whilst mergeability o...
Publication date: 2004